ABSTRACT

Transcription factor Krüppel-like factor 4 (Klf4), one of the factors directing cellular reprogramming, recognizes the CpG dinucleotide (whether methylated or unmodified) within a specific G/C-rich sequence. The binding affinity of the mouse Klf4 DNA-binding domain for methylated DNA is only slightly stronger than that for an unmodified oligonucleotide. The structure of the C-terminal three Krüppel-like zinc fingers (ZnFs) of mouse Klf4, in complex with fully methylated DNA, was determined at 1.85 Å resolution. An arginine and a glutamate interact with the methyl group. By comparison with two other recently characterized structures of ZnF protein complexes with methylated DNA, we propose a common principle of recognition of methylated CpG by C2H2 ZnF proteins, which involves a spatially conserved Arg-Glu pair.

INTRODUCTION 

The control of gene expression in mammals relies in part on 
the modification status of DNA cytosine residues, which 
exist in at least five forms: cytosine (C), 5-methylcytosine 
(5mC), 5-hydroxymethylcytosine (5hmC), 5-formyl- 
cytosine (5fC) and 5-carboxylcytosine (5caC) (1-4). DNA 
methyltransferases methylate cytosines in the context (pri- 
marily) of CpG dinucleotides, generating 5mC in the 
genome (5,6). Ten-eleven translocation (Tet) dioxygenases 
convert 5mC to 5hmC, 5fC and 5caC in three consecutive 
oxidation reactions (7-10). The exact functions of these 
oxidized cytosine bases are under investigation. 

The cytosine modifications can be 'interpreted' or 'read' by effector (or reader) molecules. There are currently three best-known classes of mammalian proteins containing domains that recognize modified DNA. The first class includes methyl-binding domains that recognize methylated cytosine in fully methylated CpG dinucleotides (11). The second class includes Su(var)3-9, En(zeste) and Trithorax (SET) and really interesting new gene (RING) finger-associated domains that recognize hemimethylated CpG sites, which are transiently generated during DNA replication and methylated on the parental strand only (12). The third class comprises the C2H2 zinc finger (ZnF) proteins that preferentially bind to methylated CpG within a longer specific DNA sequence (13). This unique feature of ZnF proteins is important in that 'sequences longer than CpG would be necessary for the regulation of gene expression by methylation' (14).

Recently, ZnF DNA-binding domains from two proteins, Kaiso and Zfp57, were structurally analyzed in complex with their respective methylated DNA elements (15,16). Here we analyze the interaction of transcription factor Krüppel-like factor 4 (Klf4) with its target methylated DNA element. By comparing three examples of ZnF-methylated DNA interactions, we have derived an apparent consensus sequence motif associated with recognition of methylated CpG elements.

Klf4 is one of 26 members of the specificity protein/Krüppel-like factor (Sp/Klf) family of ZnF transcription factors (17-19) and is one of the four Yamanaka reprogramming factors (20). Two recent studies suggested that Klf4 binds specific methylated and/or unmethylated elements. Using a DNA pull-down approach combined with quantitative mass spectrometry, three Klf proteins (Klf2, Klf4 and Klf5) were identified as 5mC readers in mouse embryonic stem (ES) cells (21). Using a protein microarray-based approach, 47 human transcription factors, including human KLF4, were found to bind to methylated CpG sites (22). Mouse Klf4 and human KLF4 share an identical DNA-binding domain composed of three standard Krüppel-like ZnFs (Figure 1a). The consensus-binding element for Klf4 was

Figure 1. Klf4 binds methylated CpG. (a) Sequence alignment of the C-terminal ZnF DNA-binding domains of mouse Klf4 (mKlf4) and human KLF4 (hKLF4), which are identical in sequence. The mutations made by Hu et al. (22), R458A and D460A of hKLF4, are located in the last (third) ZnF, which does not directly participate in methyl-CpG binding. (b) Schematic representation of the mKlf4 DNA-binding ZnF domain. The sequence and the secondary structure are shown as follows: (arrows) β strands and (ribbons) α helices. The positions highlighted are responsible for Zn ligand binding (C2H2) and DNA base-specific interactions at the -1, -4, -5 and -7 positions (relative to the first zinc-binding histidine): solid lines (direct hydrogen bonds) and dashed lines (van der Waals contacts). The DNA sequence used for the study is shown with the majority of base interactions involving the top strand from 3'-to-5' (left-to-right). The central GCG sequence is colored in magenta and the letter 'm' indicates the methyl group in 5mC. Dotted and solid vertical lines indicate specific binding interactions. (c) The mKlf4 ZnF protein binds in the major groove of DNA with ZnF1 (blue), ZnF2 (green) and ZnF3 (pink). (d) Lys413 of ZnF1 at the -7 position interacts with the O6 oxygen atoms of both guanines at G9 (of upper strand) and G10 (of lower strand). (e) His416 of ZnF1 at the -4 position interacts with the TpG dinucleotide. (f) Arg443-Gua6 interaction; a layer of ordered water molecules (marked 'w') shields the methyl group of lower strand 5mC. (g) The upper strand 5mCpG interacts with Arg443 and forms a 5mC-Arg-Gua triad. (h) One of the carboxylate oxygen atoms of Glu446 forms a weak C-H...O type of hydrogen bond with the methyl group of the upper strand 5mC. (i) Asp445 of ZnF2 at the -5 position interacts with Arg443 at the -7 position and the N4 atom of 5mC of the upper strand. (j) Arg449-G4 interaction. (k) Arg471-G3 interaction; Asp473 of ZnF3 at the -5 position interacts with Arg471 at the -7 position and the N4 atom of Cyt4 of the lower strand. (l) His474-A2 interaction. (m and n) Structural comparison of mKlf4 Glu446 in the absence (m) and presence of methylation (n).

determined by both base-specific mutagenesis [5'-(A/G)(G/A)GG(C/T)G(C/T)-3'] (18) and chromatin immunoprecipitation sequencing (ChIP-seq) [5'-GGG(C/T)G(T/G)GG-3'] (23). These both share a central GG(C/T)G, which contains either CpG, which can be methylated, or TpG, which is intrinsically methylated on one strand and can be methylated on the other strand (CpA) by DNA methyltransferase 3a (Dnmt3a) (24,25). In other words, as with Kaiso (see below), TpG can substitute for the (±methyl)-CpG dinucleotide in the consensus sequences, and has a methyl group in the same position as methylated C (5-carbon of the pyrimidine). Here we analyze the Klf4 interactions with methylated DNA both structurally and biochemically.
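
The two consensus elements quoted above can be turned into simple pattern matchers. The following is a minimal sketch, assuming the (X/Y) notation means "either base"; the function names are illustrative and are not part of the published analysis. The test sequence is the 10-bp oligonucleotide 5'-GAGGCGTGGC-3' used for crystallization (see Results).

```python
import re

# Consensus elements as written in the text, with (X/Y) meaning either base.
MUTAGENESIS_CONSENSUS = "(A/G)(G/A)GG(C/T)G(C/T)"
CHIP_SEQ_CONSENSUS = "GGG(C/T)G(T/G)GG"


def consensus_to_regex(consensus):
    """Convert '(A/G)GG' style notation into a regex such as '[AG]GG'."""
    return re.sub(r"\(([ACGT])/([ACGT])\)", r"[\1\2]", consensus)


def find_sites(consensus, sequence):
    """Return 0-based start positions of all matches, overlaps included."""
    # A lookahead lets the scan report overlapping occurrences.
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    oligo = "GAGGCGTGGC"  # top strand of the crystallized 10-bp DNA
    print(consensus_to_regex(MUTAGENESIS_CONSENSUS))
    print(find_sites(MUTAGENESIS_CONSENSUS, oligo))
    print(find_sites(CHIP_SEQ_CONSENSUS, oligo))
```

The sketch scans one strand only; scanning the reverse complement as well would be needed for a genome-wide search.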

MATERIALS AND METHODS 

Protein expression and purification 

A glutathione S-transferase (GST)-tagged mouse Klf4 fragment (residues 396-483; pXC1248) and its Glu446-to-alanine mutant (E446A; pXC1257) were cloned into the pGEX6P-1 vector and expressed in Escherichia coli BL21-CodonPlus(DE3)-RIL (Stratagene). Bacterial cells were cultured at 37°C in Luria-Bertani medium and induced for protein expression with 0.2 mM isopropyl β-D-1-thiogalactopyranoside at 16°C overnight. The bacteria were harvested and lysed by sonication in 20 mM Tris-HCl (pH 7.5), 250 mM NaCl, 5% (v/v) glycerol and 0.5 mM tris(2-carboxyethyl)phosphine (TCEP), followed by centrifugation for 35 min at 18 000 rpm (SA-300 rotor). After purification on Glutathione Sepharose 4B (GE Healthcare), the GST tag on the recombinant protein was removed by PreScission protease (purified in-house), resulting in the additional N-terminal residues Gly-Pro-Leu-Gly-Ser (GPLGS) relative to the wild-type (WT) sequence. Protein was further purified on HiTrap-Q, HiTrap-SP and Superdex-200 (16/60) (GE Healthcare) and concentrated to ~20 mg/ml in 20 mM Tris-HCl (pH 7.5), 200 mM NaCl, 5% (v/v) glycerol and 0.5 mM TCEP. The yield of the mutant E446A protein was ~10% that of the wild-type protein.

Crystallography 

The purified Klf4 protein was incubated with annealed oligonucleotides at an equimolar ratio for 0.5 h on ice before crystallization. The final solution contained 0.8 mM protein-DNA complex. Crystals were obtained by the sitting-drop method; the mother liquor contained 100 mM Tris-HCl (pH 8.5), 250 mM NaCl and 20% polyethylene glycol 8000. Crystals grew within 3 days at 16°C.

The crystals were flash frozen by plunging into liquid nitrogen. X-ray diffraction data were collected at the SER-CAT beamline at the Advanced Photon Source, Argonne National Laboratory. HKL2000 (26) and CCP4 packages (27) were used for the data processing. The structure was solved by molecular replacement with the coordinates of 2WBU (28) as an initial search model using the PHENIX (29) and Phaser programs (30). Model refinement was performed with COOT (31) and PHENIX. Molecular graphics were generated with the PyMOL program (DeLano Scientific LLC). The Dali server (32) was used for determining the root-mean-squared deviations of different structures.

DNA-binding assay by fluorescence polarization 

Fluorescence polarization assays for Klf4 DNA binding were performed in 20 mM Tris-HCl (pH 7.5), 150 mM NaCl, 5% (v/v) glycerol and 0.5 mM TCEP at room temperature (~22°C) using a Synergy 4 Microplate Reader (BioTek). Fluorescently labeled double-stranded DNA probe (10 nM for WT or 1 nM for the E446A mutant) and various amounts of Klf4 protein, with a final volume of 50 µl, were incubated in a 384-well plate for 0.5 h before measurement. The sequences of 6-carboxy-fluorescein (FAM)-labeled double-stranded oligonucleotides were FAM-5'-TT GCC AYG CCT C-3' and 3'-CGG TGX GGA G-5' (where Y = C or 5mC, and X = C, 5mC, 5hmC, 5fC or 5caC). The control DNA sequences were FAM-5'-GTT GCM GCG TG-3' and 3'-CAA CGG XGC AC-5'. Curves were fit individually using Origin 7.5 software (OriginLab). KD values were calculated as [mP] = [maximum mP] × [C]/(KD + [C]) + [baseline mP], where [mP] is millipolarization and [C] is protein concentration. Averaged KD values and their standard errors are reported. We have found that the absolute magnitude of binding affinity by Klf4 is sensitive to the percentage of glycerol used in the reaction; therefore, absolute (not relative) KD values vary somewhat among experiments done at different times (Figure 2a and b).
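
The single-site binding equation above can be fit with a few lines of code. The sketch below is not the Origin 7.5 routine used in this study; it is an illustrative alternative that exploits the fact that, for a fixed KD, the model is linear in [maximum mP] and [baseline mP]. It scans KD on a logarithmic grid and solves the linear part by ordinary least squares. The function names, the grid bounds and the synthetic data are all assumptions made for illustration.

```python
import math
import statistics


def binding_signal(conc, kd, max_mp, baseline_mp):
    """[mP] = [maximum mP] * [C] / (KD + [C]) + [baseline mP]."""
    return max_mp * conc / (kd + conc) + baseline_mp


def fit_kd(concs, signals, kd_min=1e-3, kd_max=1e3, steps=2000):
    """Grid-search KD (log-spaced); fit max and baseline linearly at each KD."""
    n = len(concs)
    mean_y = sum(signals) / n
    best = None
    for i in range(steps + 1):
        kd = kd_min * (kd_max / kd_min) ** (i / steps)
        x = [c / (kd + c) for c in concs]  # fractional saturation
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx == 0:
            continue  # degenerate: all concentrations identical
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, signals))
        max_mp = sxy / sxx
        baseline = mean_y - max_mp * mean_x
        sse = sum((max_mp * xi + baseline - yi) ** 2 for xi, yi in zip(x, signals))
        if best is None or sse < best[0]:
            best = (sse, kd, max_mp, baseline)
    return best[1], best[2], best[3]


def mean_and_sem(values):
    """Average of replicate KD values and the standard error of the mean."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


if __name__ == "__main__":
    # Synthetic, noise-free titration (hypothetical values, arbitrary units).
    concs = [0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0]
    signals = [binding_signal(c, 0.5, 150.0, 50.0) for c in concs]
    print(fit_kd(concs, signals))
```

Because each curve is fit individually, replicate KD values can then be passed to `mean_and_sem` to obtain the averaged value and its standard error.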

Genomic analysis of methylation profiles within 
Klf4-binding sites 

For the methylation profile in the mouse Klf4-binding sites in ES cells, the published Klf4 ChIP-seq (Gene Expression Omnibus (GEO) data set GSM288354) (23) and methylome data in both ES cells and neuronal progenitor (NP) cells (GEO data set GSE30202) (33) were used. We recorded the frequency of the methylation levels at the GGCG sites in the 60-nt windows centered around the midpoints of the reported ChIP-seq peak regions (Figure 2e). Of the 10 297 Klf4 ChIP-seq peak regions in the mouse ES cells, 5530 contain the GGCG motif. As a comparison, we randomly picked an equal number of 60-nt windows in the mouse ES genome and carried out the same analysis (Figure 2f). Similarly, for the methylation profiles in Klf4-binding sites in human H1 cells, the published KLF4 ChIP-seq (GEO data set GSM447584) (34) and methylome data (GEO data set GSM432685) (34) were used. Mouse genome mm9 and human genome hg19 were used in these analyses.
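
The window-based analysis described above can be outlined in code. This is a minimal sketch under stated assumptions: coordinates are 0-based, the motif is searched on both strands (the text does not say whether one or both strands were scanned), and methylation levels are percentages binned into ten equal intervals. All function names are hypothetical, and no real ChIP-seq or methylome data are included.

```python
def centered_window(peak_start, peak_end, width=60):
    """Return (start, end) of a width-nt window centered on the peak midpoint."""
    midpoint = (peak_start + peak_end) // 2
    start = max(0, midpoint - width // 2)
    return start, start + width


def find_all(sequence, motif):
    """Yield every 0-based start of motif in sequence, overlaps included."""
    pos = sequence.find(motif)
    while pos != -1:
        yield pos
        pos = sequence.find(motif, pos + 1)


def ggcg_cpg_positions(sequence):
    """Return sorted top-strand positions of the C of each CpG inside a GGCG motif.

    GGCG on the top strand carries its CpG at offset 2. GGCG on the bottom
    strand reads CGCC on the top strand, where the same CpG starts at offset 0.
    """
    seq = sequence.upper()
    hits = {p + 2 for p in find_all(seq, "GGCG")}
    hits |= set(find_all(seq, "CGCC"))
    return sorted(hits)


def methylation_histogram(levels_percent, bins=10):
    """Count sites per methylation bin; 100% falls into the last bin."""
    counts = [0] * bins
    for level in levels_percent:
        index = min(int(level * bins // 100), bins - 1)
        counts[index] += 1
    return counts


if __name__ == "__main__":
    window_seq = "AAGGCGTTTCGCCAA"  # toy sequence, not genomic data
    print(centered_window(100, 200))
    print(ggcg_cpg_positions(window_seq))
    print(methylation_histogram([0, 5, 95, 100, 50]))
```

Running the same functions on an equal number of randomly placed windows would give the background distribution used for comparison.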

RESULTS AND DISCUSSION 

Structure of Klf4 bound with methylated DNA 

Klf4 contains three tandem C2H2 fingers at its C-terminus (Figure 1a). A structure was previously determined for mouse Klf4 ZnF domain bound to the 10-bp

Figure 2. The effects of CpG modifications and DNA sequence on DNA binding by Klf4. (a) Binding affinities measured by fluorescence polarization assays between Klf4 and DNA with five different modification states on the upper strand (5mC = M, C, 5hmC, 5fC and 5caC). (b) Binding affinities measured between Klf4 and DNA that is fully methylated, unmodified or hemimethylated (on either strand). For these experiments, only M (5mC) and C were used. (c) A GCG-containing DNA sequence partially matching the consensus binding element of Klf4 (underlined) was used as a negative control. Fluorescence polarization (FP) is measured in millipolarization (mP). (d) Binding affinities measured between the E446A variant of Klf4 and DNA having five different modification states on the top strand. In all cases, the lower strand has M (5mC). (e) Distribution of DNA CpG methylation in mouse ES cells and NP cells (33) that is present within the core GGCG Klf4-ChIP sites identified in ES cells (23). The red arrows indicate changes of methylation levels, from hypomethylation in mouse ES cells to hypermethylation in NP cells (see Table 2), during differentiation and/or reprogramming. (f) Distribution of DNA methylation of randomly chosen GGCG sites in the mouse ES genome. (g) Distribution of DNA methylation in human H1 ES cells present within the human KLF4-ChIP sites (34). GGCGTG sequences (green) have a higher proportion of methylated sites than do GGCGGG sequences.



oligonucleotide (5'-GAGGCGTGGC-3') (28), which is present in the basic transcriptional element of the CYP1A1 gene (18). We modified this 10-bp DNA to contain a fully methylated CpG site and determined the complex structure of Klf4 at a resolution of 1.85 Å (Table 1). Except for the side chain of Glu446 (see below), the overall structure of Klf4 is essentially unchanged between complexes with methylated or unmodified DNA, with a root-mean-squared deviation of <0.5 Å when comparing 85 pairs of Cα atoms. The three ZnFs of Klf4 bind in the major groove of the DNA (Figure 1b and c). ZnF3 interacts with the 5' sequence (GAG), ZnF2 interacts with the central (potentially) methylated GCG and ZnF1 interacts with the 3' sequence (TGGC) (Figure 1b and c; note, from left to right, the protein sequence runs from N to C termini, whereas the DNA sequence of the recognition strand runs from 3' to 5').

As with the Zfp57-DNA complex (16), the two 5mCs of the two DNA strands exhibit different patterns of interaction with Klf4. A layer of ordered water molecules (marked 'w' in Figure 1f) envelops the methyl group of 5mC on the bottom strand. In contrast, the methyl group of 5mC in the top strand makes van der Waals contacts with the guanidino group of Arg443, which in turn forms bifurcated hydrogen bonds with the 3' guanine G6 (Figure 1g), forming a 5mC-Arg-Gua triad (35). In addition, the 5mC methyl group interacts with the carboxylate group of Glu446, forming a weak (3.6 Å) C-H...O type of hydrogen bond (Figure 1h), a common but underappreciated interaction in biomolecules and molecular recognition (36).

Among the side chains involved in DNA base-specific interactions, Glu446 of Klf4 undergoes one of the largest conformational changes on binding methylated versus unmethylated CpG DNA. In the structure of Klf4 bound with unmodified DNA (28), the carboxylate group of Glu446 points away from the C5 position of the cytosine, and forms a weak hydrogen bond with Arg449, which in turn interacts with the 5' guanine G4 (Figure 1m). Superimposing the two structures reveals that Glu446 moves from the Arg449-interacting conformation to the 5mC-interacting conformation via a ~100° rotation of the side chain torsion angle χ1 (Figure 1n).

Sequence and methyl-specific binding in solution 

To verify the structural observation of Klf4 binding to methylated GCG, we used fluorescence polarization analysis to measure the dissociation constants (KD) between Klf4 fingers and double-stranded oligonucleotides containing a single CpG dinucleotide. Because contact with the 5mC methyl group on the bottom strand involves only water-mediated interactions (Figure 1f), we initially altered only the top strand, replacing the 5mC with unmodified cytosine (C) or three different oxidative modifications (5hmC, 5fC and 5caC),

Table 1. X-ray data collection and refinement statistics

Protein                          Klf4
DNA (M = 5mC)                    3'-CGGTGMGGAG-5'
                                 5'-GCCAMGCCTC-3'
Beamline                         APS 22-BM
Wavelength (Å)                   1.000
Total number of images           360 (1° rotation and 3 s exposure)
Space group                      P4(3)2(1)2
Cell dimensions
  a, b, c (Å)                    48.705, 48.705, 131.015
  α, β, γ (°)                    90, 90, 90
Resolution (Å)^a                 35-1.85 (1.92-1.85)
Rmerge (%)                       0.076 (0.501)
<I>/σ(I)                         27.5 (1.8)
Completeness (%)^a               97.7 (80.4)
Redundancy^a                     12.0 (3.9)
Observed reflections             166 500
Unique reflections^a             13 918 (1103)

Refinement
Resolution (Å)                   34.44-1.85
Number of reflections            13 843
Rwork / Rfree (%)                18.68 / 23.39
Number of atoms
  Protein                        694
  DNA                            438
  Water                          134
  Others                         13 (3 Zn2+, 1 acetate molecule, 1 glycerol molecule)
B-factors (Å^2)                  21.9 (overall)
  Protein                        21.5
  DNA                            20.1
  Water                          29.4
  Others                         30.6
Root mean squared deviations
  Bond lengths (Å)               0.006
  Bond angle (°)                 1.313

^a Data for the highest-resolution shell are given in parentheses.

with 5mC on the bottom strand in all cases. The binding affinity for fully methylated (M/M) DNA is slightly stronger than that of hemimethylated (C/M) DNA under the assay conditions (Figure 2a), although the difference is only ~40%. Each oxidation event, from 5mC to 5hmC to 5fC to 5caC, resulted in progressively weaker binding (by factors of ~2, 3 and 6, respectively). We repeated these experiments with the same oligonucleotides either unmodified (C/C), hemimethylated on one strand (M/C or C/M) or fully methylated (M/M) (Figure 2b). Klf4 shows similar affinity for DNA that is fully methylated (M/M) or hemimethylated on the top strand (M/C), with affinity reduced slightly, by a factor of ~1.5, for unmodified DNA (C/C) and DNA hemimethylated on the bottom strand (C/M).
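
To put these fold changes on an energy scale, a ratio of dissociation constants can be converted to a difference in binding free energy with the standard relation ΔΔG = RT ln(KD,2/KD,1). The sketch below is an illustrative calculation, not part of the reported analysis; it assumes the assay temperature of ~22°C given in the Methods, and the function name is hypothetical.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal / (mol K)


def ddg_from_fold_change(fold_change, temp_celsius=22.0):
    """Free-energy difference (kcal/mol) for a given ratio of KD values."""
    temp_kelvin = temp_celsius + 273.15
    return GAS_CONSTANT_KCAL * temp_kelvin * math.log(fold_change)


if __name__ == "__main__":
    # Fold changes quoted in the text: ~1.5 (methylation), ~2, 3 and 6 (oxidation).
    for fold in (1.5, 2.0, 3.0, 6.0):
        print(fold, round(ddg_from_fold_change(fold), 2))
```

A factor of ~1.5 corresponds to roughly 0.24 kcal/mol, and a factor of ~6 to roughly 1 kcal/mol, which is consistent with describing the methylation effect as modest.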

Thus, methylation had significant, though modest, effects on binding. In contrast, mutating the sequence outside of the central GCG abolished Klf4 binding, regardless of the GCG modification status (Figure 2c). Together, these data indicate that the interaction between Klf4 and DNA depends largely on the specific sequence context and significantly but less profoundly on the cytosine modification state. This observation differs from that of Spruijt et al., who used a DNA pull-down with recombinant GST-Klf4-ZF domain, followed by western blotting against GST (21). Two examples with



Table 2. Examples of hypomethylation in mouse ES cells and hypermethylation in NP cells within the same 5'-GGCGTG-3' sequences

RefSeq ID       Gene     Chromosome   CpG start    ES (%)   NP (%)
NM_022312       TnR      chr1         161743113    0        87.5
NM_009271       Src      chr2         157262347    0        100
NM_001127367    DnaJ     chr5         30064128     4.3      87.5
NM_030719       Gatsl2   chr5         134597736    14.3     100
NM_175521       Nyap1    chr5         138173048    0        92
NM_016721       Iqgap1   chr7         87926428     18.2     100
NM_011858       Tenm4    chr7         103642828    18.2     93.3
NM_013875       Pde7b    chr10        20120229     7.7      90
NM_172260       Cep68    chr11        20131937     14.3     100
NM_001039198    Zfhx2    chr14        55690858     0        100
NM_001253759    Enox1    chr14        77918843     6.25     100
NM_134090       Kdelr3   chr15        79349986     12.5     90.9
NR_040470       ncRNA    chr17        34042605     0        100
NM_020625       Zbtb22   chr17        34055672     18.2     100
NM_175276       Fhod3    chr18        25027364     5        86.2

different DNA sequences were shown with a sequence 
containing four repeats of GAC and a sequence contain- 
ing three overlapping Klf4 consensus motifs. In both 
instances, western blots indicated the highest binding to 
oligonucleotides containing 5mC (21), even though the 
(GAC)4 sequence does not resemble the Klf4 consensus 
sequence. 

The structural results strongly implicate Glu446 in 
cytosine modification discrimination, so we replaced the 
negatively charged Glu446 of mouse Klf4 with alanine 
(E446A). The E446A mutant exhibited no detectable 
selectivity of methylated over oxidative derivatives (5fC 
and 5caC), though it maintained 5mC selectivity over un- 
modified and 5-hydroxymethylated cytosines (C and 
5hmC) (Figure 2d). This change in selectivity is not via 
decreased relative affinity for 5mC, but rather via an 
increase in the relative affinities for 5fC and 5caC. Like 
the corresponding glutamate residue in Zfp57 (37), the 
side chain of Glu446 in Klf4 (the size and the charge) is 
dispensable for methyl group recognition. Extensive sub- 
stitution study of the corresponding glutamate in Zfp57 
suggested that the negatively charged glutamate side chain 
carboxylate group might be critical in discriminating 
against the negatively charged carboxylate moiety of 
5caC (37). 

Using previously published data sets of Klf4 ChIP-seq profiles (23) and bisulphite-sequenced methylomes (33), we then examined the methylation status of the CpG site in the core Klf4 binding motif GGCG. Our analyses indicate that a substantial number of Klf4-binding sites in mouse (Figure 2e) and human ES cells (Figure 2g) are methylated (21). Importantly, many of the unmethylated Klf4-binding sites in mouse ES cells become hypermethylated in NP cells (33) (Figure 2e). Table 2 lists 15 such sites containing the 5'-GGCGTG-3' Klf4-binding sequence that exhibit both hypomethylation (<20%) in ES cells and hypermethylation (>80%) in NP cells. This suggests that Klf4 may bind methylated loci in differentiated cells (which should be tested by ChIP-seq of Klf4 in NP cells), and thereby initiate stem-cell-specific
gene expression patterns during reprogramming (21). For comparison, randomly chosen GGCG sites in mouse genomes are heavily methylated in both ES and NP cells (Figure 2f). Because the bisulphite sequencing method used to generate mouse methylomes in stem cells and NPs (33) does not distinguish between 5mC and 5hmC, or among C, 5fC and 5caC (38), the exact modification status of these sites is unknown. All of these sites (Table 2) are located in the gene bodies, as are over half of the Klf4 ChIP-seq sites. While promoter methylation strongly correlates with gene silencing, DNA (hydroxyl)methylation within the gene body is associated with gene activation (39), and the mechanisms of how gene body (hydroxyl)methylation correlates with gene expression are currently under investigation (40). In this respect, it is noteworthy that a very recent study in mouse ES cells (41) indicates that the 5mC dioxygenase Tet1 primarily affects 5hmC levels at gene promoters and transcription start sites, whereas Tet2 mainly modulates those levels in gene bodies.

Structural comparison with Zfp57 and Kaiso 

As in previously characterized DNA-binding ZnF structures (42), the DNA base contacts are made by the side chains in the N-terminal portion of the α helix, together with the residue immediately preceding the α helix. Because the first zinc-binding histidine (C-X2-4-C-X12-H-X2-6-H) is located almost always in the middle of the DNA recognition α helix and the spacing between Cys2 and His2 is constant (12 residues), we use the amino acids at positions -1 to -8 (relative to the first zinc-binding histidine) in the following text to discuss the residues making base contact. This numbering scheme allows us to discuss from the perspective of primary sequence without relying on the more variably spaced first position of the α helix. In the ZnF2 of Klf4, the arginine at the -1 position (RH) makes direct base contact to the 5' Gua, the glutamate at -4 interacts with the central 5mC and the arginine at -7 recognizes the 3' Gua of methylated GCG (Figure 3a).
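
The numbering scheme can be applied to a primary sequence programmatically. The sketch below is a minimal illustration, assuming the C2H2 spacing C-X2-4-C-X12-H-X2-6-H; it locates the first zinc-binding histidine of each finger and reports the residues at the negative positions discussed in the text. The function name is hypothetical, and the example sequence is a toy finger constructed for illustration (with Arg at -7, Asp at -5, Glu at -4 and Arg at -1), not a sequence taken from this study.

```python
import re

# C-X(2-4)-C-X(12)-H-X(2-6)-H; group 1 captures the first zinc-binding His.
C2H2_PATTERN = re.compile(r"C.{2,4}C.{12}(H).{2,6}H")


def residues_relative_to_first_his(sequence, offsets=(-1, -4, -5, -7, -8)):
    """For each C2H2 finger found, map each offset to the residue at that position."""
    fingers = []
    for match in C2H2_PATTERN.finditer(sequence.upper()):
        his_index = match.start(1)  # position 0 in the numbering scheme
        fingers.append({off: sequence[his_index + off] for off in offsets})
    return fingers


if __name__ == "__main__":
    # Toy finger built to match the pattern (assumption, not real Klf4 sequence).
    toy_finger = "CAAAACGAAAARSDELTRHAAAH"
    print(residues_relative_to_first_his(toy_finger))
```

Because the Cys2-to-His spacing is fixed at 12 residues, every offset from -1 to -8 is guaranteed to fall inside the matched finger, which is what makes this numbering robust across fingers.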

Zfp57 recognizes the sequence GMGGCA [where M = 5mC; in the literature, the sequence of the opposite strand TGCCGC was initially used (43)]. Structural analysis of the complex between fully methylated DNA and the tandem two fingers of the mouse Zfp57 DNA-binding domain revealed that the methylated GCG sequence is read by the second ZnF using an arginine at the -1 position (RH), a glutamate at -4 and an arginine at -8 (Figure 3b). Pairwise comparison of the corresponding ZnFs of Klf4 and Zfp57 indicates that the Arg side chains, at the -7 position of Klf4 and -8 in Zfp57, are spatially superimposable (Figure 3c).

Kaiso recognition sequences contain either a methylated CpG (44) or a TpG dinucleotide (45) (both have a methyl group at the 5-carbon of the pyrimidine). Structures of the three-ZnF DNA-binding domain of Kaiso, in complex with its methylated CpG or TpG-containing cognate sequences, have been examined (15). Strikingly, interactions similar to those in Klf4 and Zfp57 are observed: an arginine at the -1 position (RH) of ZnF1 interacts with the 3' guanine of the 5mCpG or TpG dinucleotides, while a glutamate at the -5 position of ZnF2 interacts with 5mC (Figure 3d) or T. The fact that methyl-CpG sites within the consensus recognition sequences of Klf4 and Kaiso can be substituted by TpG raises an intriguing possibility. Perhaps TpG/CpA sites, which could be methylated by Dnmt3a (24), are selected for when it is advantageous for a particular DNA sequence to be treated as if it is a permanently (hemi)methylated version of the recognition sequence; the shared recognition mechanism for (5mC/T)pG provides that option.

A noncontiguous Arg-Glu pair for methyl-CpG 
recognition 

The fact that all three ZnF proteins examined here use an arginine and glutamate pair to recognize the methyl group of 5mC (or thymine) implies that glutamate might be a favorable amino acid for recognizing 5mC (or T). However, there is one substantial difference: Glu446 of Klf4 makes a weak C-H...O type of hydrogen bond with the 5mC methyl group (Figure 3e), whereas Glu182 of Zfp57 (like Glu335 of Kaiso) forms a van der Waals contact with the methyl group of 5mC, while one of its carboxylate oxygen atoms also interacts with the N4 atom of the same 5mC base (Figure 3f and g). This difference in interaction might explain the relatively small increase of binding affinity of Klf4 for 5mCpG versus CpG.

The prediction of methyl-CpG binding proteins from 
primary sequences is still challenging. However, to date, 
all structurally characterized methyl-CpG binding 
proteins (except the base-flipping SET and RING finger- 
associated domain proteins) involve a 5mC-Arg-Gua triad 
(35). Here we examined structures of three ZnF proteins in 
complex with modified and unmodified DNA molecules. 
We propose that the presence of a spatially conserved 
(nonconsecutive but spatially proximate) Arg-Glu pair 
in C2H2 ZnF proteins is suggestive of a 5mCpG binding 
preference (see Figure 3). In the cases of Klf4 and Zfp57, the Arg-Glu pair is within a single ZnF, with an arginine at the -7 or -8 position and a glutamate at -4 (relative to the first zinc-binding histidine). In the case of Kaiso, the methyl-CpG binding residues come from two neighboring ZnF fingers, with an arginine at the -1 position (RH) of the N-terminal ZnF and a Glu at the -5 position of the C-terminal ZnF.

Significantly, sequence identity is >65% among 
the three-finger DNA-binding domains of the Sp/Klf 
family (46) (Figure 3h). This conservation implies a 
similar pattern of DNA recognition among the 
family members and suggests that other Sp/Klf proteins 
might be sensitive to DNA methylation status. It is 
interesting that, while the ubiquitous G/C-rich elements 
serve as binding sites for all tested Sp/Klf family 
members, Sp proteins mainly function as transactivators, 
whereas Klf proteins can activate or repress gene expression (46). The ability to bind either unmethylated or modified elements may contribute to the diverse regulatory mechanisms of Sp/Klf-mediated gene expression via selectively recruiting chromatin cofactors (47). Although our knowledge is currently limited to the genome-wide

Figure 3. Structural and sequence comparisons of three C2H2 ZnF proteins and their respective DNA interactions. (a and b) The second ZnF of Klf4 or Zfp57 recognizes a methylated GCG sequence. (c) Superimposed GCG-recognition helices of Klf4 (green) and Zfp57 (brown). Arg at the -7 position of Klf4 and Arg at the -8 position of Zfp57 are spatially aligned. (d) Kaiso uses two neighboring ZnF fingers, an arginine at the -1 position of the N-terminal ZnF and a Glu at the -5 position of the C-terminal ZnF, to recognize 5mCpG or TpG. (e) In Klf4, together with the Arg at the -7 position, the side chain of Glu at the -4 position forms a C-H...O type of hydrogen bond with the 5mC methyl group. (f) In Zfp57, together with the Arg at the -8 position, the side chain of Glu at the -4 position forms a van der Waals contact with the 5mC methyl group and one of its carboxylate oxygen atoms also interacts with the N4 atom of the same 5mC base. (g) Although not aligned at the primary sequence level, Kaiso has spatially conserved Arg and Glu, from two neighboring ZnF fingers (d), forming interactions with 5mC similar to those of Zfp57. (h) Sequence alignment of the second ZnF of three-ZnF DNA-binding domains among the mouse Sp/Klf family members. The negatively charged glutamate, at the -4 position, appears to be critical in discriminating against the negatively charged carboxylate moiety of 5caC, the final oxidative product of 5mC.

distribution of 5mC and 5hmC during cellular differentiation, the ability to predict transcription factor sensitivity to DNA modifications is becoming increasingly important.